ABSTRACT

We study a routing problem that arises on SIMD parallel architectures whose communication
network forms a toroidal mesh. We assume there exists a set of $k$ message descriptors $\{(x_i, y_i)\}_{i=1}^{k}$,
where $(x_i, y_i)$ indicates that the $i$th message’s recipient is offset from its sender by $x_i$ hops in one
mesh dimension and $y_i$ hops in the other. Every processor has $k$ messages to send, and all processors
use the same set of message routing descriptors. The SIMD constraint implies that at any routing
step, every processor is actively routing messages with the same descriptors as every other processor.
We call this Isomorphic Routing. Our objective is to find the isomorphic routing schedule with least
makespan. We consider a number of variations on the problem, yielding complexity results ranging from
$O(k)$ to NP-complete. Most of our results follow after we transform the problem into a scheduling
problem, where it is related to other well-known scheduling problems.
NAS1-18605 and NAS1-19480 while the second author was in residence at the Institute for Computer Applications
in Science and Engineering (ICASE), NASA Langley Research Center, Hampton, VA 23681-0001.
1 Introduction


The issue of routing messages in a parallel computer network has attracted a considerable
amount of attention, and a host of problem variations exist. For example, some models presume
that every processor $i$ holds a number and that one wishes to implement some permutation
(e.g., [6]). Another variation is to assume that each processor $i$ has a list of messages, each
of which is destined for an arbitrary processor; this is known as “all-to-all personalized
communication” [4]. Our problem is a constrained case of all-to-all personalized communication
on an $n \times m$ toroidal mesh. It is also a constrained case of the general “compiled
communication” problem studied in [1], where the problem is to construct a communication
schedule for an irregular computation.

To begin with, in our problem we can always describe a message’s destination in terms
of its offset from the source processor in both mesh dimensions, X and Y. Thus, a pair $(x, y)$
describes a message’s routing requirements. Observe, however, that a message needn’t travel
exactly $x$ units in the X dimension and $y$ units in the Y dimension: because of wrap-around, it may equally
well travel $m - x$ units in X and/or $n - y$ units in Y. Now imagine a parallel
computation where every processor performs the same computation, but on different data.
Further suppose that the pattern of messages every processor sends is the same, e.g., patterns
associated with discretization stencils [7]. We may thus describe the communication
requirements of the entire computation in terms of the offsets $\{(x_1, y_1), \ldots, (x_k, y_k)\}$ of the
$k$ messages a single processor sends. We will say that the $nm$ different messages with a
common offset pair are all isomorphic.

Every processor has four communication ports, referenced as North, East, West, and 
South (N, E, W, and S). We assume the communication links are full-duplex. We are inter- 
ested in SIMD (Single Instruction Multiple Data) architectures, where processors execute 
the same instruction stream in lock-step. Unless the architecture provides special support 
for local indirect addressing (which is much slower even when provided), an implication of 
SIMD processing is that at every instant, the set of messages moving through all ports of 
a common type (e.g., N) are isomorphic. We desire a routing schedule that minimizes the 
time required to complete the communication, i.e., the makespan. 

We will examine variations of the problem, finding they have a surprising range of 
complexities. The variations derive from assumptions concerning how many communication 
ports may be active at a time, and whether a message must be fully routed once it begins 
moving or if it can be temporarily buffered at an intermediate processor. The assumptions 
and associated complexities are given below. 

• One port active at a time: $O(k)$;

• All ports active, temporary buffering allowed: $O(k \log k)$;

• All ports active, no temporary buffering: NP-complete.

Let us now state the problem more formally. In a toroidal mesh of $n \times m$ processors, a
set of messages $M = \{m_1, m_2, \ldots, m_k\}$ is to be sent from each of the $n \times m$ processors (the
sources) to some other processors (the destinations). Each message $m_i$ is represented by a
pair of integers $(x_i, y_i)$ giving the relative offset of its destination from its source. Assume
$0 \le x_i \le m - 1$ and $0 \le y_i \le n - 1$. We wish to design a schedule so that all messages at
each processor are sent to (and received by) their destinations in the minimum amount of
time. Depending on the problem variation, at any time a processor can send one message
in one of four directions (N, E, W, or S), or it can send up to four
messages, one in each direction. We assume it always takes one time unit for a message
to traverse one link. Note that for any message $m_i = (x_i, y_i)$ there are four possible
ways to send it: East and North, $(x_i, y_i)$; East and South, $(x_i, -(n - y_i))$; West and North,
$(-(m - x_i), y_i)$; and finally, West and South, $(-(m - x_i), -(n - y_i))$. Because the mesh
is toroidal, they all reach the same destination. Depending on the problem variation, we
either assume that a message must be routed to completion in a successive series of steps,
or that a message’s movement can be fragmented, e.g., one step N, two steps buffered, one
step W, another step N, and so on.
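The four route choices can be checked directly. The Python sketch below is illustrative only; the names `four_routes` and `destination` are ours, not the paper’s. It assumes an $n \times m$ torus with X positions taken modulo $m$ and Y positions modulo $n$, lists the signed offsets for a message $(x_i, y_i)$ (positive meaning East or North), and confirms that all four land on the same processor.

```python
from typing import List, Tuple


def four_routes(x: int, y: int, m: int, n: int) -> List[Tuple[int, int]]:
    """Signed (dx, dy) hop counts for the four route choices of message (x, y).

    Positive dx means East, negative means West; positive dy means North,
    negative means South. The torus has m columns (X) and n rows (Y).
    """
    if not (0 <= x <= m - 1 and 0 <= y <= n - 1):
        raise ValueError("offset out of range")
    return [
        (x, y),                  # East and North
        (x, -(n - y)),           # East and South
        (-(m - x), y),           # West and North
        (-(m - x), -(n - y)),    # West and South
    ]


def destination(src: Tuple[int, int], move: Tuple[int, int], m: int, n: int) -> Tuple[int, int]:
    """Apply a signed move to a (column, row) source, with wrap-around."""
    return ((src[0] + move[0]) % m, (src[1] + move[1]) % n)


if __name__ == "__main__":
    m, n = 3, 2                                   # a 2 x 3 mesh
    for dx, dy in four_routes(2, 1, m, n):
        # every route from (0, 0) reaches column 2, row 1
        print((dx, dy), destination((0, 0), (dx, dy), m, n), abs(dx) + abs(dy))
```

For the message $(2, 1)$ in a $2 \times 3$ mesh, the two West routes need 2 hops and the two East routes need 3.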

For example, in the $2 \times 3$ toroidal mesh shown in FIG. 1 (a), 3 messages are to be sent:
$m_1 = (1, 0)$, $m_2 = (2, 1)$, and $m_3 = (0, 1)$. Assuming that all ports may be active
simultaneously, we easily determine that the makespan of the optimal schedule, denoted by
$C^*_{\max}$, is 2. From time 0 to time 1, each processor sends $m_1$ East to its destination, $m_2$
West, and $m_3$ North to its destination. From time 1 to time 2, each processor sends $m_2$
North to its destination. The schedule is illustrated in FIG. 1 (b). Under our assumption of
isomorphic message passing, each processor does exactly the same thing at the same time.
Any time a processor sends out a message on one port (e.g., N), in the following time step
a message isomorphic to it is received on the opposite port (e.g., S), except that one unit of
routing service in one dimension (e.g., Y) has been given. This observation suggests that we
can approach the scheduling problem in terms of a single processor giving routing service
to each of its $k$ messages. The schedule for one processor can be shown by a traditional
Gantt chart as in FIG. 1 (c).
FIG. 1. An example.
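The single-processor view makes a schedule easy to check by machine. The following sketch is a hypothetical helper, not from the paper: it encodes a schedule as a list of time steps, each mapping a port to the index of the message it carries, verifies that no message uses two ports in one step and that each message ends at its destination modulo the mesh size, and compares the makespan with the per-message lower bound $\max_i(\min\{x_i, m - x_i\} + \min\{y_i, n - y_i\})$. It assumes all four ports may be active, as in the example above.

```python
from typing import Dict, List, Tuple

# Unit move contributed by one step on each port: (dx, dy), East/North positive.
PORT_MOVE = {"E": (1, 0), "W": (-1, 0), "N": (0, 1), "S": (0, -1)}


def check_schedule(msgs: List[Tuple[int, int]], steps: List[Dict[str, int]],
                   m: int, n: int) -> int:
    """Validate a single-processor schedule and return its makespan.

    msgs[i] is the offset (x_i, y_i); steps[t] maps a port name to the index
    of the message it carries during step t (so each port is used at most once).
    """
    moved = [[0, 0] for _ in msgs]
    for t, step in enumerate(steps):
        carried = list(step.values())
        if len(carried) != len(set(carried)):
            raise ValueError(f"a message uses two ports at step {t}")
        for port, i in step.items():
            dx, dy = PORT_MOVE[port]
            moved[i][0] += dx
            moved[i][1] += dy
    for (x, y), (dx, dy) in zip(msgs, moved):
        # with wrap-around only the offsets modulo the mesh size matter
        if (dx % m, dy % n) != (x, y):
            raise ValueError(f"message ({x}, {y}) misrouted")
    return len(steps)


def job_lower_bound(msgs: List[Tuple[int, int]], m: int, n: int) -> int:
    """Each message needs at least min(x, m - x) + min(y, n - y) steps."""
    return max(min(x, m - x) + min(y, n - y) for x, y in msgs)


if __name__ == "__main__":
    msgs = [(1, 0), (2, 1), (0, 1)]                 # m_1, m_2, m_3 of FIG. 1
    steps = [{"E": 0, "W": 1, "N": 2}, {"N": 1}]    # the FIG. 1 (b) schedule
    print(check_schedule(msgs, steps, m=3, n=2), job_lower_bound(msgs, m=3, n=2))
```

On the FIG. 1 data both values are 2, so the schedule of FIG. 1 (b) meets the lower bound.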


A message may travel in either direction within a dimension. This allows the possibility of
schedules that cause a message to “backtrack”, e.g., move 3 units W, and later move 4 units
E. When temporary buffering is provided at each processor, such a schedule can
always be improved (or at least not degraded) by removing the backtracking loop; hence if
$C^*_{\max}$ is the minimized makespan for an instance of the isomorphic routing problem, there
exists a backtracking-free schedule with cost $C^*_{\max}$. When there is no temporary buffering,
backtracking may be needed just to keep a message moving until it reaches its destination.
In the remainder we will confine our attention to backtracking-free schedules.

The problem defined above can be converted to an equivalent problem similar to
open shop scheduling. We are given four machines E, W, N, S, in which E and W are
identical in function but give different service times, as do N and S. There are $k$ jobs.
Each job $J_i$ consists of two tasks $X_i$ and $Y_i$, where $X_i$ can only be executed
by E or W (but not both, because there is no backtracking), taking $x_i$ or $m - x_i$ time
units, respectively, and $Y_i$ can only be executed by N or S, taking $y_i$ or $n - y_i$ time units,
respectively. The integers $m$, $n$, $x_i$’s, and $y_i$’s are as defined in the original problem above.
Tasks $X_i$ and $Y_i$ cannot be executed simultaneously at any time, but may be broken into
unit-time slices. Moreover, in some problem variations a job $J_i$ may be suspended once it
has begun execution, a feature that corresponds to a message being buffered en route.
All execution periods of a given task occur on the same machine. Our goal is to find a schedule to
execute all jobs so that the makespan, or maximum completion time $C_{\max}$, is minimized.

This problem definition suggests a new group of problems, which we call multi-operation/multi-machine
scheduling problems. In the classical multi-operation model [5], each job requires
execution on more than one machine. In an open shop the order in which a job
passes through the machines is immaterial, whereas in a flow shop each job has the same
machine ordering, and in a job shop the jobs may have different machine orderings. In
the multi-operation/multi-machine model, instead of having just one machine to perform
a certain kind of task for a job, there is a back-up machine with the same function and a
possibly different cost.

We can distinguish the situations in which a task requires identical service at either
common function machine, or has different service requirements that depend on the machine.
Our problem is a special case of the latter. In particular, we assume that for each pair of
common function machines there exists an integer $c$ ($c = m$ for E-W, $c = n$ for N-S) such
that a task with demand $x_i$ requires $x_i$ units on one machine and $c - x_i$ units on the other.
In this case we will say that the machines give complementary service.

In the remainder we will refer to problem variations by the following names.

$P_1$: Only one machine (out of all four) may be executing at a time.

$P_2$: All four machines may execute simultaneously, jobs may be suspended, common function
machines give complementary service.

$P_3$: All four machines may execute simultaneously, jobs may not be suspended, common
function machines give complementary service.

$P_4$: All four machines may execute simultaneously, jobs may be suspended, common function
machines give uniform service.

$P_5$: All four machines may execute simultaneously, jobs may not be suspended, common
function machines give uniform service.

$P_1$, $P_2$, and $P_3$ have meaning in the context of the isomorphic routing problem; $P_4$ and
$P_5$ are natural variations of the multi-operation/multi-machine scheduling problem. We
will establish complexity bounds for each of these problems.

We organize this paper as follows. In Section 2, we study the complexity of all the
problems above, save $P_2$. $P_1$ is shown to be $O(k)$, while the other variations are shown to
be NP-complete. Section 3 develops an algorithm for problem $P_2$, and Section 4 develops
an $O(k \log k)$ implementation of the algorithm. Section 5 presents our conclusions. The
Appendix proves some useful lemmas in detail.

2 Complexity results for $P_1$, $P_3$, $P_4$, and $P_5$

Problem $P_1$ allows only one machine to be executing at a time. The solution is trivial: step
through the jobs sequentially, giving exhaustive service to one task and then the other, in
each case selecting the machine that serves the task more quickly. The resulting makespan is
$\sum_{i=1}^{k} (\min\{x_i, m - x_i\} + \min\{y_i, n - y_i\})$. $2k$ comparisons are performed in the course of
selecting machines, giving the algorithm complexity $O(k)$. While
not very interesting in the scheduling context, the situation follows from the isomorphic
routing problem under the constraint that at any step, only one communication port can
be active. This is a seemingly natural constraint, but it is not always required. For example,
the Thinking Machines CM-2 is able to communicate on all ports simultaneously [1].
Indeed, the problem studied in [1] is similar to ours, in that it seeks to schedule communication
(albeit irregular, as opposed to our isomorphic assumption) on the CM-2’s hypercube
communication network.
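A minimal sketch of this $O(k)$ procedure, assuming that a message with zero offset in a dimension needs no service there; the function name and the plan format are illustrative choices, not the paper’s.

```python
from typing import List, Tuple


def schedule_p1(msgs: List[Tuple[int, int]], m: int, n: int) -> Tuple[int, List[Tuple[int, str, int]]]:
    """P1: one port at a time, each task served exhaustively on its faster port.

    Returns (makespan, plan); plan lists (message index, port, steps) in order.
    One comparison per task gives 2k comparisons in total.
    """
    plan: List[Tuple[int, str, int]] = []
    for i, (x, y) in enumerate(msgs):
        east_west = (i, "E", x) if x <= m - x else (i, "W", m - x)
        north_south = (i, "N", y) if y <= n - y else (i, "S", n - y)
        for entry in (east_west, north_south):
            if entry[2] > 0:          # zero-length tasks need no service
                plan.append(entry)
    return sum(steps for _, _, steps in plan), plan


if __name__ == "__main__":
    print(schedule_p1([(1, 0), (2, 1), (0, 1)], m=3, n=2))
```

For the three messages of FIG. 1 the one-port makespan is 4, compared with 2 when all ports may be active.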

Next we show that $P_3$, $P_4$, and $P_5$ are NP-complete. First consider $P_4$, where common
function machines give uniform service. Assume that machines $M_1$, $M_2$ are identical, as
are $M_3$, $M_4$. There are $k$ jobs, $J_1, J_2, \ldots, J_k$. Each job $J_i$ consists of two tasks $X_i$ and $Y_i$,
where $X_i$ can only be executed by $M_1$ or $M_2$, taking $x_i$ time units on either machine,
and $Y_i$ can only be executed by $M_3$ or $M_4$, taking $y_i$ time units on either machine. A
job may be suspended, but may never have both its tasks receiving service simultaneously.
Our goal is to find a schedule with the minimum makespan $C_{\max}$. We shall next prove
that whether we allow preemption of tasks or not, the problem is NP-complete.
Note that the NP-completeness of this formulation (an open shop scheduling problem with
identical back-up machines, with or without preemption) implies the intractability of
general multi-operation/multi-machine scheduling problems.

Theorem 1. $P_4$ is NP-complete.

Proof. Consider the corresponding decision problem, in which, given a bound $B$, we are
asked whether there is a schedule with $C_{\max} \le B$. For any instance of the NP-complete
problem PARTITION [2], given $A = \{a_1, a_2, \ldots, a_k\}$ (positive integers), we construct an
instance of the decision problem in which there are $k + 2$ jobs: $x_i = a_i$ and $y_i = 0$ for
$i = 1, 2, \ldots, k$; $x_{k+1} = x_{k+2} = \frac{1}{2}\sum_{i=1}^{k} a_i + 1$ and $y_{k+1} = y_{k+2} = 0$; and finally $B = \sum_{i=1}^{k} a_i + 1$.

We claim that there exists $A' \subseteq A$ such that $\sum_{a_i \in A'} a_i = \frac{1}{2}\sum_{i=1}^{k} a_i$ iff there is a schedule
with $C_{\max} \le B$ for the instance defined.

If there exists $A' \subseteq A$ such that $\sum_{a_i \in A'} a_i = \frac{1}{2}\sum_{i=1}^{k} a_i$ (for notational simplicity assume
that $A' = \{a_1, a_2, \ldots, a_h\}$), then we can construct a schedule with $C_{\max} = B$ as shown in
FIG. 2. Even though the schedule constructed does not preempt any task, it is also feasible
for the instance that allows preemption, since non-preemption is a special case
of preemption. As a matter of fact, the schedule in FIG. 2 is the best possible, since for any
feasible schedule $C_{\max} \ge \lceil \frac{1}{2}\sum_{i=1}^{k+2} x_i \rceil = \sum_{i=1}^{k} a_i + 1 = B$.
FIG. 2. A schedule with $C_{\max} = B$ for the instance of the decision problem of $P_4$.

If there exists a schedule with $C_{\max} = B$, then the two big tasks $X_{k+1}$ and $X_{k+2}$ cannot
be scheduled on one machine, since otherwise $C_{\max} \ge x_{k+1} + x_{k+2} = \sum_{i=1}^{k} a_i + 2 > B$. Without
loss of generality, assume $X_{k+1}$ is scheduled on $M_1$ and $X_{k+2}$ is scheduled on $M_2$. For
the remaining $k$ X-type tasks $X_1, \ldots, X_k$, because $\sum_{i=1}^{k} x_i = \sum_{i=1}^{k} a_i = 2B - (x_{k+1} + x_{k+2})$,
$M_1$ and $M_2$ are not idle from time 0 to time $B$. Without loss of generality, assume tasks
$X_1, \ldots, X_h$ are scheduled on $M_1$, and tasks $X_{h+1}, \ldots, X_k$ are scheduled on $M_2$. We have
$\sum_{i=1}^{h} x_i = B - x_{k+1} = \frac{1}{2}\sum_{i=1}^{k} a_i$. This is true regardless of whether preemption of tasks is
allowed or not. So there exists $A' = \{a_1, \ldots, a_h\} \subseteq A$ such that $\sum_{a_i \in A'} a_i = \frac{1}{2}\sum_{i=1}^{k} a_i$. ■
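The reduction can be exercised on a small PARTITION instance. In this sketch (hypothetical helper names, 0-based job indices, Y tasks omitted because they all have length 0), `p4_instance` builds the X workloads and $B$, and `fig2_schedule` lays out the FIG. 2 schedule for a given subset $A'$: both machines finish exactly at $B$ when $A'$ has half the total sum, and one of them overruns otherwise.

```python
from typing import List, Tuple


def p4_instance(a: List[int]) -> Tuple[List[int], int]:
    """X workloads of the k + 2 jobs (every y_i is 0) and the bound B."""
    s = sum(a)
    if s % 2:
        raise ValueError("an odd total has no equal partition")
    big = s // 2 + 1                   # x_{k+1} = x_{k+2}
    return a + [big, big], s + 1       # B = sum(a) + 1


def fig2_schedule(a: List[int], subset: List[int]) -> List[List[Tuple[int, int, int]]]:
    """Non-preemptive FIG. 2 layout: per machine, a list of (job, start, end).

    subset holds the indices of A'. M1 runs X_{k+1} then A'; M2 runs X_{k+2}
    then the remaining elements of A. Job indices are 0-based.
    """
    x, _ = p4_instance(a)
    k, chosen = len(a), set(subset)
    rest = [i for i in range(k) if i not in chosen]
    machines = []
    for jobs in ([k] + list(subset), [k + 1] + rest):
        t, row = 0, []
        for job in jobs:
            row.append((job, t, t + x[job]))
            t += x[job]
        machines.append(row)
    return machines


if __name__ == "__main__":
    a = [3, 1, 1, 2, 2, 1]
    _, B = p4_instance(a)
    print(B, [row[-1][2] for row in fig2_schedule(a, [0, 3])])
```

With $A = \{3, 1, 1, 2, 2, 1\}$ and $A' = \{3, 2\}$ (indices 0 and 3), both machines finish at $B = 11$; choosing $A' = \{3\}$ instead leaves $M_1$ finishing at 9 and $M_2$ at 13.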

Now let us consider $P_5$, in which a job’s service must be continuous, and common
function machines give uniform service. The requirement of continuity does not prohibit
the tasks from being broken into slices which are independently scheduled, so long as a
job’s execution is not interrupted. It is easy to see that the proof of Theorem 1 can be
used without any change to prove the NP-completeness of problem $P_5$ in both cases of
preemption and non-preemption. Thus we have the additional result:

Theorem 2. $P_5$ is NP-complete.

Now suppose that a job’s service must be continuous, and that common function machines
give complementary service. We assume that a task can be broken into unit-time
slices. This formulation corresponds directly to an isomorphic routing problem where we
require that once begun, a message continues to move at each step until it reaches its
destination. It turns out that this variation is also intractable.

Theorem 3. $P_3$ is NP-complete.

Proof. Consider the corresponding decision problem, in which, given a bound $B$, we are
asked whether there is a schedule with $C_{\max} \le B$. For any instance of the NP-complete
problem PARTITION, given $A = \{a_1, a_2, \ldots, a_k\}$ (positive integers), we construct an
instance of the decision problem as follows. Let $m$ and $n$ be two integers much larger than
$\sum_{i=1}^{k} a_i + 2$. We are given four machines $M_1, M_2, M_3, M_4$, where $M_1$ and $M_2$ are identical in
function but give complementary service ($x$ and $m - x$, respectively, for workload $x$), as do $M_3$
and $M_4$ ($y$ and $n - y$, respectively, for workload $y$). There are $k + 4$ jobs, $J_1, J_2, \ldots, J_{k+4}$,
each of which consists of an X-type task and a Y-type task. Let $x_i = a_i$ and $y_i = 0$ for
$i = 1, 2, \ldots, k$; $x_{k+1} = m - \frac{1}{2}\sum_{i=1}^{k} a_i$ and $y_{k+1} = \frac{1}{2}\sum_{i=1}^{k} a_i + 1$; $x_{k+2} = m - \frac{1}{2}\sum_{i=1}^{k} a_i$
and $y_{k+2} = n - \frac{1}{2}\sum_{i=1}^{k} a_i - 1$; $x_{k+3} = 1$ and $y_{k+3} = \frac{1}{2}\sum_{i=1}^{k} a_i$; $x_{k+4} = m - 1$ and
$y_{k+4} = n - \frac{1}{2}\sum_{i=1}^{k} a_i$. Finally, let $B = \sum_{i=1}^{k} a_i + 1$. We claim that there exists $A' \subseteq A$ such
that $\sum_{a_i \in A'} a_i = \frac{1}{2}\sum_{i=1}^{k} a_i$ iff there is a schedule with $C_{\max} \le B$ for the instance defined.

If there exists $A' \subseteq A$ such that $\sum_{a_i \in A'} a_i = \frac{1}{2}\sum_{i=1}^{k} a_i$ (for notational simplicity, assume
$A' = \{a_1, a_2, \ldots, a_h\}$), then we can construct a schedule with $C_{\max} = B$ as shown in FIG. 3.
As a matter of fact, the schedule in FIG. 3 is the best possible, since for any feasible
schedule $C_{\max} \ge \lceil \frac{1}{4}\sum_{i=1}^{k+4} (\min\{x_i, m - x_i\} + \min\{y_i, n - y_i\}) \rceil = \sum_{i=1}^{k} a_i + 1 = B$.


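[FIG. 3: Gantt chart of machines $M_1$, $M_2$, $M_3$, $M_4$; time axis marked at $0$, $\frac{1}{2}(B - 1)$, $\frac{1}{2}(B - 1) + 1$, and $B$.]

FIG. 3. A schedule with $C_{\max} = B$ for the instance of the decision problem of $P_3$.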

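If there exists a schedule with $C_{\max} = B$, then every task must receive its cheaper service (any
other choice raises the total work above $4B$, because $m$ and $n$ are large), so $X_1, X_2, \ldots, X_k$ and
$X_{k+3}$ must be scheduled on $M_1$; $X_{k+1}$, $X_{k+2}$, $X_{k+4}$ on $M_2$; $Y_{k+1}$, $Y_{k+3}$ on $M_3$; and $Y_{k+2}$, $Y_{k+4}$
on $M_4$. All four machines are then busy throughout $[0, B]$, and job $J_{k+1}$ needs
$\frac{1}{2}\sum_{i=1}^{k} a_i + (\frac{1}{2}\sum_{i=1}^{k} a_i + 1) = B$ units of service, so it is in service at every instant. Since $X_{k+1}$
and $Y_{k+1}$ cannot be executed simultaneously, $M_2$ executes $X_{k+1}$ exactly when $M_3$ executes
$Y_{k+3}$. So we say that the executions of $X_{k+1}$ and $Y_{k+3}$ are completely parallel. Since
the executions of $X_{k+3}$ and $Y_{k+3}$ are continuous, so are the executions of $X_{k+3}$ and $X_{k+1}$.
Similarly, we can show that the executions of $X_{k+4}$ and $X_{k+2}$ are also continuous. How can
the schedule have $X_{k+3}$ on $M_1$ and $X_{k+1}$, $X_{k+2}$, and $X_{k+4}$ on $M_2$ such that the continuity
of $X_{k+3}$ and $X_{k+1}$ and the continuity of $X_{k+4}$ and $X_{k+2}$ are both respected? It is not hard
to see that $X_{k+3}$ must be scheduled from time $\frac{1}{2}\sum_{i=1}^{k} a_i$ to time $\frac{1}{2}\sum_{i=1}^{k} a_i + 1$. Therefore the
set $\{X_1, X_2, \ldots, X_k\}$ is divided into two sets of equal sums. So there exists $A' \subseteq A$ such
that $\sum_{a_i \in A'} a_i = \frac{1}{2}\sum_{i=1}^{k} a_i$. ■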
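The forward direction of the proof can also be checked mechanically. The sketch below builds the $k + 4$ jobs and a schedule laid out according to the argument above: $X_{k+3}$ occupies $[\frac{1}{2}\sum a_i, \frac{1}{2}\sum a_i + 1]$ on $M_1$, with $A'$ before it and the rest of $A$ after it. Since FIG. 3 itself did not survive, this layout is our reconstruction, not a copy of the figure. The checker (a hypothetical helper) confirms that every machine is busy from 0 to $B$ without overlap, that every interval has the length its machine charges ($x$, $m - x$, $y$, or $n - y$), and that each job’s service forms one unbroken block. Jobs are 0-based, so indices $k, \ldots, k + 3$ stand for $J_{k+1}, \ldots, J_{k+4}$.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int, int]          # (job index, start, end)


def p3_instance(a: List[int], m: int, n: int) -> List[Tuple[int, int]]:
    """(x_i, y_i) for all k + 4 jobs of the Theorem 3 construction."""
    h = sum(a) // 2
    return [(ai, 0) for ai in a] + [
        (m - h, h + 1), (m - h, n - h - 1), (1, h), (m - 1, n - h)]


def fig3_schedule(a: List[int], subset: List[int]) -> Tuple[Dict[str, List[Interval]], int]:
    """Layout for a half-sum subset A' (indices into a); returns (schedule, B)."""
    k, h, B = len(a), sum(a) // 2, sum(a) + 1
    chosen = set(subset)
    rest = [i for i in range(k) if i not in chosen]
    m1, t = [], 0
    for i in subset:                               # A' first ...
        m1.append((i, t, t + a[i]))
        t += a[i]
    m1.append((k + 2, h, h + 1))                   # ... then X_{k+3} ...
    t = h + 1
    for i in rest:                                 # ... then the rest of A
        m1.append((i, t, t + a[i]))
        t += a[i]
    return {
        "M1": m1,                                                 # serves x
        "M2": [(k, 0, h), (k + 3, h, h + 1), (k + 1, h + 1, B)],  # serves m - x
        "M3": [(k + 2, 0, h), (k, h, B)],                         # serves y
        "M4": [(k + 1, 0, h + 1), (k + 3, h + 1, B)],             # serves n - y
    }, B


def check_p3(jobs: List[Tuple[int, int]], sched: Dict[str, List[Interval]],
             B: int, m: int, n: int) -> None:
    """Raise ValueError unless sched is a gap-free, non-suspending schedule of length B."""
    service = {"M1": lambda x, y: x, "M2": lambda x, y: m - x,
               "M3": lambda x, y: y, "M4": lambda x, y: n - y}
    blocks: Dict[int, List[Tuple[int, int]]] = {}
    for name, rows in sched.items():
        t = 0
        for job, start, end in sorted(rows, key=lambda r: r[1]):
            if start != t:
                raise ValueError(f"{name} is idle or overlaps at time {t}")
            if end - start != service[name](*jobs[job]):
                raise ValueError(f"job {job} has the wrong length on {name}")
            blocks.setdefault(job, []).append((start, end))
            t = end
        if t != B:
            raise ValueError(f"{name} finishes at {t}, not {B}")
    for job, ivs in blocks.items():
        ivs.sort()
        # continuity: each task starts exactly when the job's previous task ends
        if any(ivs[i][1] != ivs[i + 1][0] for i in range(len(ivs) - 1)):
            raise ValueError(f"job {job} is suspended")


if __name__ == "__main__":
    a = [3, 1, 1, 2, 2, 1]                 # sum 10, half-sum subset {3, 2}
    sched, B = fig3_schedule(a, [0, 3])
    check_p3(p3_instance(a, 100, 100), sched, B, 100, 100)
    print("layout is feasible with C_max =", B)
```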

We are left now with the problem of analyzing $P_2$. This will require most of the remainder
of the paper. Our approach will be to recognize that $P_2$ is a variation on a scheduling
problem, denoted by $P_2'$, where the decision of which machine to use for any given task is
a given input parameter; the sign of $x_i$ or $y_i$ determines which machine to use. In the isomorphic
routing problem this is equivalent to specifying the directions the message
must travel. This might arise, for instance, if the N and E ports could be used only for
sending messages, whereas the S and W ports could be used only for receiving them.

Assuming that machine usage is pre-specified, the resulting problem $P_2'$ is related to a
paper by Gonzalez and Sahni [3]. That paper studies general open shop scheduling with
preemption, and proves that $C^*_{\max} = \alpha = \max_{i,j}\{T_i, L_j\}$, where $T_i$ is the sum of the execution
times of all tasks scheduled on machine $M_i$ and $L_j$ is the sum of the execution times of all tasks
of job $J_j$. To construct the optimal schedule for any instance with $m$ machines, $n$ jobs, and
$r$ nonzero tasks (here $m$ and $n$ count machines and jobs, not mesh dimensions), an
$O(r(\min\{r, m^2\} + m \log n))$ algorithm is presented.

It is easy to see that $P_2'$ is in fact a special case of this open shop scheduling problem, in
which parameters are integers and preemptions are only allowed at integral points. Furthermore,
we notice that the minimum makespan for any instance of $P_2'$, $C^*_{\max}$, is at least

$$\alpha = \max_{i,j}\{T_i, L_j\} = \max\Big\{\sum_{x_i > 0} x_i,\ \sum_{x_i < 0} (-x_i),\ \sum_{y_i > 0} y_i,\ \sum_{y_i < 0} (-y_i),\ \max_j \{|x_j| + |y_j|\}\Big\}.$$

When we apply Gonzalez and Sahni’s algorithm to $P_2'$, we obtain an optimal preemptive
schedule with $C_{\max} = \alpha$. Since all preemptions occur at integral points, this is
actually the optimal solution to $P_2'$. The time complexity of Gonzalez and Sahni’s algorithm
when applied to $P_2'$ is $O(k \log k)$.
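For a fixed choice of directions, $\alpha$ is only a handful of sums. A sketch, assuming each message is given by signed offsets whose signs select the machines as in $P_2'$ (positive: E or N; negative: W or S); the function name is ours.

```python
from typing import List, Tuple


def alpha_fixed(msgs: List[Tuple[int, int]]) -> int:
    """Lower bound alpha = max_{i,j}{T_i, L_j} for P2' with signed offsets.

    A positive x uses E and a negative x uses W; likewise N/S for y.
    """
    t_e = sum(x for x, _ in msgs if x > 0)        # load on machine E
    t_w = sum(-x for x, _ in msgs if x < 0)       # load on machine W
    t_n = sum(y for _, y in msgs if y > 0)        # load on machine N
    t_s = sum(-y for _, y in msgs if y < 0)       # load on machine S
    longest_job = max((abs(x) + abs(y) for x, y in msgs), default=0)
    return max(t_e, t_w, t_n, t_s, longest_job)


if __name__ == "__main__":
    print(alpha_fixed([(1, 0), (-1, 1), (0, 1)]))
```

For FIG. 1 with $m_2$ routed West, the signed offsets are $(1, 0)$, $(-1, 1)$, $(0, 1)$ and $\alpha = 2$, matching $C^*_{\max}$.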

In view of this result, our approach will be to take a problem instance of $P_2$ and
determine the machine assignments that minimize $\alpha$. Gonzalez and Sahni’s algorithm
may then be applied to construct the actual schedule.


3 An algorithm for $P_2$

As pointed out in the last section, solving $P_2$ can be reduced to the problem of finding the
task-to-machine assignment that minimizes the makespan. The actual schedule can then be
determined in $O(k \log k)$ time using the algorithm of Gonzalez and Sahni. In this section
we develop an algorithm that makes the needed assignment.

We abstract our problem as follows. We are given two sets of items, $X = \{X_1, X_2, \ldots, X_k\}$
and $Y = \{Y_1, Y_2, \ldots, Y_k\}$, and nonnegative integers $x_1, x_2, \ldots, x_k$, $y_1, y_2, \ldots, y_k$, $m$,
and $n$, where $x_i \le m - 1$ and $y_i \le n - 1$ for all $i$. We must define a function $F : X \cup Y \to \mathbb{N}$
with $F(X_i) = x_i$ or $m - x_i$, and $F(Y_i) = y_i$ or $n - y_i$, such that
$\alpha = \max_{i,j}\{T_i, L_j\} = \max\{\alpha_1, \alpha_2, \alpha_3\}$ is minimized, where

$$\alpha_1 = \max\Big\{\sum_{i:\,F(X_i) = x_i} x_i,\ \sum_{i:\,F(X_i) = m - x_i} (m - x_i)\Big\}$$

$$\alpha_2 = \max\Big\{\sum_{i:\,F(Y_i) = y_i} y_i,\ \sum_{i:\,F(Y_i) = n - y_i} (n - y_i)\Big\}$$

$$\alpha_3 = \max_i \{F(X_i) + F(Y_i)\}$$
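Because each of the $2k$ tasks has only two possible values of $F$, the objective can be evaluated exhaustively on small instances. The following is a hypothetical brute-force reference, exponential in $k$ ($4^k$ assignments), intended only for testing faster methods such as the one developed in this section.

```python
from itertools import product
from typing import Sequence


def alpha(xs: Sequence[int], ys: Sequence[int], m: int, n: int,
          x_right: Sequence[bool], y_right: Sequence[bool]) -> int:
    """alpha = max{alpha_1, alpha_2, alpha_3} for one choice of F.

    x_right[i] is True when F(X_i) = m - x_i (otherwise F(X_i) = x_i);
    y_right[i] likewise selects F(Y_i) = n - y_i.
    """
    fx = [m - x if r else x for x, r in zip(xs, x_right)]
    fy = [n - y if r else y for y, r in zip(ys, y_right)]
    alpha1 = max(sum(v for v, r in zip(fx, x_right) if not r),
                 sum(v for v, r in zip(fx, x_right) if r))
    alpha2 = max(sum(v for v, r in zip(fy, y_right) if not r),
                 sum(v for v, r in zip(fy, y_right) if r))
    alpha3 = max((u + v for u, v in zip(fx, fy)), default=0)
    return max(alpha1, alpha2, alpha3)


def min_alpha_bruteforce(xs: Sequence[int], ys: Sequence[int], m: int, n: int) -> int:
    """Exhaustive minimum over all 4^k assignments; small k only."""
    k = len(xs)
    return min(alpha(xs, ys, m, n, xr, yr)
               for xr in product((False, True), repeat=k)
               for yr in product((False, True), repeat=k))


if __name__ == "__main__":
    print(min_alpha_bruteforce([6, 6], [3, 0], 10, 10))
```

For instance, with $m = n = 10$, $x = (6, 6)$ and $y = (3, 0)$, the minimum is $\alpha = 7$.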

We first describe an algorithm A that defines a function $f : X \cup Y \to \mathbb{N}$ with $f(X_i) = x_i$
or $m - x_i$, and $f(Y_i) = y_i$ or $n - y_i$, such that the resulting $\alpha_1$ and $\alpha_2$ are both minimized.
Following this, we look at how $F$ may deviate from $f$, and show how to modify $f$ so as to
create $F$.

Algorithm A

1. Sort $x_1, x_2, \ldots, x_k$ and $y_1, y_2, \ldots, y_k$ in nondecreasing order, separately.

2. Do the following to each sorted list. The pseudo-code below defines $f_X : X \to \mathbb{N}$ with
$f_X(X_i) = x_i$ or $m - x_i$ such that the resulting $\alpha_1$ is minimized. To define $f_Y : Y \to \mathbb{N}$ for
the minimum $\alpha_2$, we simply replace the notation for the X-list by the corresponding
notation for the Y-list.

    For notational simplicity, assume $x_1, x_2, \ldots, x_k$ are in nondecreasing order;
    $\alpha_1^+ \leftarrow 0$; $\alpha_1^- \leftarrow 0$; $i \leftarrow 1$; $j \leftarrow k$;
    while $i \le j$ do
        if $\alpha_1^+ + x_i \le \alpha_1^- + (m - x_j)$
        then { $f_X(X_i) \leftarrow x_i$; $\alpha_1^+ \leftarrow \alpha_1^+ + x_i$; $i \leftarrow i + 1$ }
        else { $f_X(X_j) \leftarrow m - x_j$; $\alpha_1^- \leftarrow \alpha_1^- + (m - x_j)$; $j \leftarrow j - 1$ };
    $\alpha_1 \leftarrow \max\{\alpha_1^+, \alpha_1^-\}$;

3. $f : X \cup Y \to \mathbb{N}$ is the combination of $f_X$ and $f_Y$.

We recognize $\alpha_1^+$ as accumulating the first term in $\alpha_1$, and $\alpha_1^-$ as accumulating the
second. Given the sorted ordering of the $x_i$’s, the algorithm finds a turning point $t$, where
$f(X_i) = x_i$ for $i \le t$ and $f(X_i) = m - x_i$ for $i > t$; furthermore, among all such turning
points the one chosen minimizes $\max\{\alpha_1^+, \alpha_1^-\}$. That this algorithm attains the minimum
value of $\alpha_1$, denoted $\alpha_1^*$, follows from the fact that some optimal assignment must have this
structure. To see this, assume there are $p$ and $q$ with $1 \le p < q \le k$ such that $f(X_p) = m - x_p$
and $f(X_q) = x_q$. Since $x_p \le x_q$, exchanging the two assignments (so that $f(X_p) = x_p$ and
$f(X_q) = m - x_q$) changes each of the two sums in $\alpha_1$ by $x_p - x_q \le 0$, so it does not increase
$\alpha_1$. We may apply this argument
repeatedly until the resulting assignment exhibits a turning point, as claimed.
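Algorithm A can be written as a short two-pointer loop. This sketch follows the pseudo-code above with 0-based indices (the right pointer moves leftward), returns the assignment in the original task order (True meaning $f(X_i) = m - x_i$), and compares the result with an exhaustive search over all $2^k$ assignments, which exercises the turning-point argument empirically. The function names and the random test parameters are our own choices.

```python
import random
from itertools import product
from typing import List, Tuple


def algorithm_a(xs: List[int], m: int) -> Tuple[int, List[bool]]:
    """Two-pointer form of Algorithm A for one list.

    Returns (alpha_1, right), where right[i] is True iff f(X_i) = m - x_i.
    """
    order = sorted(range(len(xs)), key=lambda i: xs[i])  # nondecreasing x
    right = [False] * len(xs)
    plus = minus = 0          # alpha_1^+ and alpha_1^- in the pseudo-code
    i, j = 0, len(xs) - 1
    while i <= j:
        lo, hi = order[i], order[j]
        if plus + xs[lo] <= minus + (m - xs[hi]):
            plus += xs[lo]            # smallest unassigned x is served as x
            i += 1
        else:
            minus += m - xs[hi]       # largest unassigned x is served as m - x
            right[hi] = True
            j -= 1
    return max(plus, minus), right


def alpha1_bruteforce(xs: List[int], m: int) -> int:
    """Minimum alpha_1 over all 2^k assignments (for testing only)."""
    best = None
    for choice in product((False, True), repeat=len(xs)):
        left = sum(x for x, r in zip(xs, choice) if not r)
        rgt = sum(m - x for x, r in zip(xs, choice) if r)
        cost = max(left, rgt)
        best = cost if best is None else min(best, cost)
    return best


if __name__ == "__main__":
    rng = random.Random(1)
    for _ in range(500):
        m = rng.randint(2, 20)
        xs = [rng.randint(0, m - 1) for _ in range(rng.randint(0, 7))]
        assert algorithm_a(xs, m)[0] == alpha1_bruteforce(xs, m)
    print("algorithm A matched exhaustive search")
```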

FIG. 4 shows an example of using algorithm A to compute the optimal value of $\alpha_1$. The
numbers in the circles are the values of the function $f_X$ for the corresponding tasks. We also
illustrate $\alpha_1^+$ and $\alpha_1^-$ as functions of the index, even though the algorithm will not generate
all the values we display. From now on, we shall use diagrams similar to FIG. 4, but
without the $\alpha_1^+$, $\alpha_1^-$ values, to represent the definition of $f$; we will call these assignment
diagrams.
FIG. 4. An example of using algorithm A to compute $\alpha_1^*$.


We see from the above discussion that $\alpha_1^*$ and $\alpha_2^*$, the optimal values of $\alpha_1$ and $\alpha_2$, can
be obtained by algorithm A in time $O(k \log k)$, while the optimal value of $\alpha_3$, denoted $\alpha_3^*$, can be
obtained by choosing $\min\{x_i, m - x_i\}$ for task $X_i$ and $\min\{y_i, n - y_i\}$ for task $Y_i$. However,
the difficulty we face is that these optimalities may not be achieved at the same time, i.e.,
the assignment minimizing $\alpha_1$ and $\alpha_2$ may not be consistent with the assignment minimizing
$\alpha_3$. To highlight the differences we will say that $f(X_i)$ (alt., $f(Y_i)$) is a bad choice if
$f(X_i) \ne \min\{x_i, m - x_i\}$ (alt., $f(Y_i) \ne \min\{y_i, n - y_i\}$), and that $f(X_i)$ (alt., $f(Y_i)$) is a disastrous
choice if $f(X_i) \ne \min\{x_i, m - x_i\}$ (alt., $f(Y_i) \ne \min\{y_i, n - y_i\}$) and $f(X_i) + f(Y_i) > \alpha^*$,
where $\alpha^* = \max\{\alpha_1^*, \alpha_2^*, \alpha_3^*\}$. In the example in FIG. 4, the shaded circles represent the bad
choices. It is easy to see that bad choices always form a contiguous block which includes
the turning point. Without loss of generality, assume that the block of bad choices is in
the left column of the assignment diagram and ends at the turning point. We observe that
if $f$ contains no disastrous choices, then $\alpha^* = \max\{\alpha_1^*, \alpha_2^*, \alpha_3^*\} \ge \max_i\{f(X_i) + f(Y_i)\} = \alpha_3$, so $f$
attains the lower bound $\alpha^*$ and $F = f$. Should $f$ contain disastrous choices, we need to consider
modifying it in order to find the function $F$ with the minimum $\alpha$.
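The two definitions translate directly into a small check. In this sketch (hypothetical names) the assignment is passed as the chosen service times $f(X_i)$ and $f(Y_i)$, and $\alpha_1^*$, $\alpha_2^*$ are supplied by the caller, e.g. from Algorithm A; $\alpha_3^*$ is computed from the cheaper service of each task. Returned indices are 0-based.

```python
from typing import List, Sequence, Tuple


def classify_choices(xs: Sequence[int], ys: Sequence[int], m: int, n: int,
                     fx: Sequence[int], fy: Sequence[int],
                     alpha1_star: int, alpha2_star: int
                     ) -> Tuple[List[int], List[int], List[int], List[int]]:
    """Return (bad_x, bad_y, disastrous_x, disastrous_y) as index lists.

    fx[i], fy[i] are the chosen service times f(X_i), f(Y_i).
    """
    best_x = [min(x, m - x) for x in xs]
    best_y = [min(y, n - y) for y in ys]
    alpha3_star = max((bx + by for bx, by in zip(best_x, best_y)), default=0)
    alpha_star = max(alpha1_star, alpha2_star, alpha3_star)
    bad_x = [i for i, (v, b) in enumerate(zip(fx, best_x)) if v != b]
    bad_y = [i for i, (v, b) in enumerate(zip(fy, best_y)) if v != b]
    # a bad choice is disastrous when its job's total exceeds alpha*
    dis_x = [i for i in bad_x if fx[i] + fy[i] > alpha_star]
    dis_y = [i for i in bad_y if fx[i] + fy[i] > alpha_star]
    return bad_x, bad_y, dis_x, dis_y


if __name__ == "__main__":
    print(classify_choices([6, 6], [3, 0], 10, 10, [6, 4], [3, 0], 6, 3))
```

With $m = n = 10$, $x = (6, 6)$, $y = (3, 0)$, Algorithm A gives $f_X = (6, 4)$, $f_Y = (3, 0)$, $\alpha_1^* = 6$, $\alpha_2^* = 3$, and $\alpha^* = 7$; the choice $f(X_1) = 6$ is then disastrous, since $6 + 3 > 7$. Swapping the two X assignments yields $\alpha = 7 = \alpha^*$.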

Let us assume, then, that we have computed an assignment $f$ by applying Algorithm A
to the X list (and so found the X assignment function $f_X$) and to the Y list (and so found
the Y assignment function $f_Y$), and have identified at least one disastrous choice. $f$ may or
may not be the optimal assignment $F$. We have developed a number of results that help us
identify jobs $J_i$ for which it may be possible that $f(X_i) \ne F(X_i)$ or $f(Y_i) \ne F(Y_i)$. Most
importantly, these results severely constrain the number of tasks whose assignment in $f$ can
differ from their assignment in $F$. Given $f$, we will identify a set of possible assignment
switches to consider; the least-cost assignment among these will be the optimal assignment.
We show that for the given $f$, only $O(k \log k)$ alternative assignments must be considered,
whence the optimal assignment is found in $O(k \log k)$ time.

We proceed now by making some definitions and stating certain results founded upon
them (proofs are relegated to the Appendix). Without loss of generality, we assume $m \ge n$
in the remainder.

Given $f$, let $B_X$ and $B_Y$ be the sets of bad choices in $f_X$ and $f_Y$, respectively, and $D_X$
and $D_Y$ be the sets of disastrous choices in $f_X$ and $f_Y$, respectively. Now, in the assignment
diagrams of $f_X$ and $f_Y$, let $X_L$ and $Y_L$ be the sets of choices in the left columns of $f_X$ and
$f_Y$, respectively, and $X_R$ and $Y_R$ be the sets of choices in the right columns of $f_X$ and
$f_Y$, respectively. We denote the sets of tasks whose assignment differs under $f$ and $F$ as
$U_X \subseteq X_L$, $V_X \subseteq X_R$, $U_Y \subseteq Y_L$, $V_Y \subseteq Y_R$. We use $\alpha_1(U_X, V_X)$ (alt., $\alpha_2(U_Y, V_Y)$) to denote
the corresponding $\alpha_1$ (alt., $\alpha_2$) resulting from the switches in $U_X, V_X$ (alt., $U_Y, V_Y$). Finally,
we will say that assignment $f(X_i)$ (alt., $f(Y_i)$) is a potential switch if either $f(X_i)$ (alt.,
$f(Y_i)$) is a disastrous choice, or $f(X_i)$ (alt., $f(Y_i)$) is a bad choice while $f(Y_i)$ (alt., $f(X_i)$) is
in $V_Y$ (alt., $V_X$), and $f(X_i) + n - f(Y_i) > \alpha_2(U_Y, V_Y)$ (alt., $m - f(X_i) + f(Y_i) > \alpha_1(U_X, V_X)$).

The next four results serve to constrain the number of switches we must consider.

Lemma 1. If $|B_X| \ge 3$, then $F = f$.

Lemma 2. $|D_Y| \le 2$.

Lemma 3. $|U_X| \ge |V_X|$ and $|U_Y| \ge |V_Y|$.

Lemma 4. All members of $U_X$ and $U_Y$ are potential switches.

Now consider the implications of these results. By Lemma 1 we only have to worry about
situations when $|B_X| \le 2$. By Lemma 4 we know that $U_X$ contains only potential switches,
which are recognizable bad choices. There are at most 4 different combinations of changing
or not changing the assignments of bad choices in the left column of $f_X$. By Lemma 3 we
know that at most two assignments in the right column of $f_X$ may change. For each fixed
combination of changes to $f_X$’s left column we need consider no more than $O(\binom{k}{2})$ pairs of
possible changes to assignments in $f_X$’s right column. We also need to consider possible
changes to $f_Y$. Lemma 2 tells us $|D_Y| \le 2$; Lemma 3 tells us $|V_Y| \le |U_Y|$; Lemma 4 tells us
that $U_Y$ may contain only potential switches, which again are either disastrous choices in $f_Y$,
or bad choices $f(Y_i)$ with $f(X_i) \in V_X$. It follows that $|V_Y| \le |U_Y| \le |D_Y| + |V_X| \le 4$. This
means that for every fixed combination of switched/non-switched assignments of potential
switches in the left column of $f_Y$, we need consider no more than all switched/non-switched
combinations of at most four good choices from the right column of $f_Y$. There are $O(\binom{k}{4})$ of these.
Considering all combinations of possible changes to $f_X$ and possible changes to $f_Y$ requires
time $O(\binom{k}{2} \cdot \binom{k}{4}) = O(k^6)$.

We describe this algorithm for problem $P_2$ formally as follows.

1. If $m < n$, rotate the mesh by 90 degrees, and redefine the parameters in the new
coordinate system.

2. Use algorithm A twice to define $f$, which minimizes $\max\{\alpha_1, \alpha_2\}$.

3. If the block of bad choices in an assignment diagram is not in the left column, rotate
the diagram by 180 degrees, and exchange the roles of the two machines involved in
the assignment.

4. If there are no disastrous choices in either $f_X$ or $f_Y$, let $F$ be $f$ and go to step 6.
Otherwise continue with step 5.

5. List all possible definitions of $U_X, V_X$ and $U_Y, V_Y$. For each possible combination of
$U_X, V_X, U_Y, V_Y$, compute its $\alpha$. Let $F$ be the function determined by the
$U_X, V_X, U_Y, V_Y$ which together result in the smallest $\alpha$.

6. Use Gonzalez and Sahni’s algorithm to construct the schedule with $C_{\max} = \alpha$.

In this algorithm, steps 1, 3, and 4 each take $O(k)$ time, while steps 2 and 6 each take
$O(k \log k)$ time. We also know that for step 5, even if we use the brute-force method of
checking all possible combinations of $U_X, V_X, U_Y, V_Y$, the time needed is still polynomial,
$O(k^6)$. In the next section, we shall show that step 5 can in fact be implemented in time
$O(k \log k)$, thus yielding an $O(k \log k)$ algorithm for $P_2$.

4 An $O(k \log k)$ implementation of the algorithm

The previous section demonstrated that the routing problem has polynomial complexity. We 
can drive the asymptotic complexity to O(klogk), but at the price of tremendous compli- 
cation in the algorithm. Our results may be primarily of theoretical interest; our algorithm 
can be implemented, but suffers from a lack of elegance. One hopes that additional work 
on the problem may yield a more intuitive solution. 

Let us now consider the following three cases: $|B_X| = 0$, $|B_X| = 1$, and $|B_X| = 2$.
We shall prove that in each case the function $F$, which minimizes $\alpha$, can be obtained in
$O(k \log k)$ time by switching some assignments in the function $f$. We will use the next three
lemmas to help reduce the number of possible combinations we must consider. Their proofs
can be found in the Appendix.

Lemma 5. If $|B_X| = 0$, then $|U_X| = |V_X| = 0$ and $|D_Y| \le 2$.




Lemma 6. If $|B_X| = 1$, then $|V_X| \le |U_X| \le 1$ and $|D_Y| \le 2$. Furthermore, if $D_Y =
\{f(Y_1), f(Y_2)\}$, then $\alpha^* < f(X_1) + f(X_2)$ and one of $f(X_1)$ and $f(X_2)$ is the largest bad
choice in $f_X$.

Lemma 7. If $|B_X| = 2$, then $|D_X| \le 1$ and $|D_Y| \le 1$. Furthermore, if $D_X = \{f(X_1)\}$, then
$D_Y = \{f(Y_1)\}$; if $D_Y = \{f(Y_1)\}$ and $f(X_1) \notin B_X$, then $f(X_1)$ must be in the right column
of the assignment diagram of $f_X$.

We first consider Case 1: $|B_X| = 0$.

By Lemma 5, $U_X = V_X = \emptyset$. Since $V_X = \emptyset$, only disastrous choices in $f_Y$ can be potential switches for $U_Y$. We consider two subcases: (a) $|D_Y| = 1$; and (b) $|D_Y| = 2$.

(a) If $D_Y = \{f(Y_1)\}$, then $f(Y_1)$ is the only potential switch in $f_Y$. Consider the following possible combinations of $U_X, V_X$ and $U_Y, V_Y$, each of which determines a feasible definition of $F$, and choose the one with the smallest $\alpha$ to be $F$. The entire process takes $O(k)$ time.


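| # | $U_X$ | $V_X$ | $U_Y$ | $V_Y$ | Time |
|---|---|---|---|---|---|
| 1 | $\emptyset$ | $\emptyset$ | $\emptyset$ | $\emptyset$ | $O(1)$ |
| 2 | $\emptyset$ | $\emptyset$ | $\{f(Y_1)\}$ | $\emptyset$ | $O(1)$ |
| 3 | $\emptyset$ | $\emptyset$ | $\{f(Y_1)\}$ | $\{f(Y_i)\},\ \forall f(Y_i) \in Y_R$ | $O(k)$ |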


(b) If $D_Y = \{f(Y_1), f(Y_2)\}$, then $f(Y_1)$ and $f(Y_2)$ are the two potential switches in $f_Y$. Without loss of generality, assume $f(X_1) + f(Y_1) \ge f(X_2) + f(Y_2)$. This means that if $|U_Y| = 1$, it must contain $f(Y_1)$, not $f(Y_2)$. Consider the following feasible definitions of $F$.


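| # | $U_X$ | $V_X$ | $U_Y$ | $V_Y$ | Time |
|---|---|---|---|---|---|
| 1 | $\emptyset$ | $\emptyset$ | $\emptyset$ | $\emptyset$ | $O(1)$ |
| 2 | $\emptyset$ | $\emptyset$ | $\{f(Y_1)\}$ | $\emptyset$ | $O(1)$ |
| 3 | $\emptyset$ | $\emptyset$ | $\{f(Y_1)\}$ | $\{f(Y_i)\},\ \forall f(Y_i) \in Y_R$ | $O(k)$ |
| 4 | $\emptyset$ | $\emptyset$ | $\{f(Y_1), f(Y_2)\}$ | $\emptyset$ | $O(1)$ |
| 5 | $\emptyset$ | $\emptyset$ | $\{f(Y_1), f(Y_2)\}$ | $\{f(Y_i)\},\ \forall f(Y_i) \in Y_R$ | $O(k)$ |
| 6 | $\emptyset$ | $\emptyset$ | $\{f(Y_1), f(Y_2)\}$ | $\{f(Y_i), f(Y_j)\},\ \forall f(Y_i), f(Y_j) \in Y_R$ | $O(k \log k)$ |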


In the sixth situation, if we check all combinations of $f(Y_i), f(Y_j) \in Y_R$ for $V_Y$, there will be $O(k^2)$ possibilities. However, not all combinations need to be examined. Our goal is to choose $f(Y_j) \in Y_R$ for each fixed $f(Y_i) \in Y_R$ so as to minimize $\max\{\alpha_2(U_Y, V_Y),\ f(X_j) + n - f(Y_j)\}$, where $\alpha_2(U_Y, V_Y) = \alpha_2^* + 2n - f(Y_1) - f(Y_2) - f(Y_i) - f(Y_j)$. First, sort all $f(Y_j) \in Y_R$ in $O(k \log k)$ time in nondecreasing order of $f(X_j) + n - f(Y_j)$. Then, scanning the sorted list from left to right, discard every choice whose value $f(Y_j)$ is no greater than that of its (surviving) left neighbor. This yields a list of $f(Y_j)$'s sorted by nondecreasing $f(X_j) + n - f(Y_j)$ and, for any fixed $f(Y_i)$, by nonincreasing $\alpha_2^* + 2n - f(Y_1) - f(Y_2) - f(Y_i) - f(Y_j)$. This step takes $O(k)$ time. Finally, for each fixed $f(Y_i)$, perform a binary search in the list to locate the $f(Y_j)$ with the minimum $\max\{\alpha_2^* + 2n - f(Y_1) - f(Y_2) - f(Y_i) - f(Y_j),\ f(X_j) + n - f(Y_j)\}$, taking $O(k \log k)$ time for all $f(Y_i)$'s. We can then check the $|Y_R|$ feasible definitions of $F$ with $V_Y = \{f(Y_i), f(Y_j)\}$ as defined above.
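The sort, prune, and binary-search step just described can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: each candidate $f(Y_j) \in Y_R$ is passed as a hypothetical tuple `(key, w, j)` with `key` $= f(X_j) + n - f(Y_j)$ and `w` $= f(Y_j)$, and for a fixed $f(Y_i)$ the constant `c` stands for $\alpha_2^* + 2n - f(Y_1) - f(Y_2) - f(Y_i)$, so the quantity minimized is $\max\{c - w, \text{key}\}$. The function names are illustrative, and the sketch does not enforce $j \ne i$.

```python
def build_frontier(candidates):
    """Sort (key, w, j) tuples by nondecreasing key and keep only those whose
    w is strictly larger than the w of the last tuple kept.

    A dropped tuple has a key no smaller and a w no larger than a kept one,
    so it can never give a smaller max(c - w, key) for any c."""
    frontier = []
    # On equal keys, visit larger w first so the weaker duplicates are dropped.
    for key, w, j in sorted(candidates, key=lambda t: (t[0], -t[1])):
        if not frontier or w > frontier[-1][1]:
            frontier.append((key, w, j))
    return frontier


def best_partner(frontier, c):
    """Return (value, j) minimizing max(c - w, key) over the frontier,
    or None if the frontier is empty.

    Along the frontier key is nondecreasing and c - w strictly decreasing,
    so key - (c - w) is strictly increasing.  Binary-search the first
    position with key >= c - w; the optimum is there or just to its left."""
    if not frontier:
        return None
    lo, hi = 0, len(frontier) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        key, w, _ = frontier[mid]
        if key >= c - w:
            hi = mid
        else:
            lo = mid + 1
    best = None
    for idx in (lo - 1, lo):
        if idx >= 0:
            key, w, j = frontier[idx]
            value = max(c - w, key)
            if best is None or value < best[0]:
                best = (value, j)
    return best


if __name__ == "__main__":
    # Hypothetical candidates (key, w, j).
    cands = [(7, 3, 0), (5, 1, 1), (9, 6, 2), (8, 2, 3)]
    front = build_frontier(cands)
    print(front)                    # [(5, 1, 1), (7, 3, 0), (9, 6, 2)]
    print(best_partner(front, 12))  # (9, 0)
```

The frontier is built once in $O(k \log k)$ time, and each fixed $f(Y_i)$ then costs one $O(\log k)$ query, which matches the bound claimed in the text.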

Case 2: $|B_X| = 1$.

By Lemma 6, when $D_Y = \{f(Y_1), f(Y_2)\}$, one of $f(X_1)$ and $f(X_2)$ must be a bad choice in $f_X$, which implies that it is a disastrous choice. Therefore, if $|D_Y| = 2$, then $|D_X| = 1$. We consider four subcases: (a) $|D_X| = 0$ and $|D_Y| = 1$; (b) $|D_X| = 1$ and $|D_Y| = 0$; (c) $|D_X| = 1$ and $|D_Y| = 1$; and (d) $|D_X| = 1$ and $|D_Y| = 2$. Note that in any situation with $V_X = \emptyset$, only disastrous choices in $f_Y$ can be potential switches for $U_Y$, so whether $|D_Y|$ is 0, 1, or 2, we can use the same method as in Case 1 to determine $F$ in $O(k \log k)$ time. Let us now assume $|V_X| = 1$, i.e., $V_X = \{f(X_i)\}$, $\forall f(X_i) \in X_R$, which also implies $U_X = B_X$.

(a) If $D_X = \emptyset$ and $D_Y = \{f(Y_1)\}$, then $f(X_1)$ cannot be a bad choice. Assuming $B_X = \{f(X_2)\}$, we have $U_X = \{f(X_2)\}$. Because $f(X_2)$ is a potential switch that is not a disastrous choice, we have $f(Y_2) \in V_Y$ and $f(Y_1) \in U_Y$. Note that $f(Y_i)$ may also be in $U_Y$ if $f(Y_i)$ is a bad choice. Consider the following feasible definitions of $F$.


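| # | $U_Y$ | $V_Y$ | Time |
|---|---|---|---|
| 1 | $\{f(Y_1)\}$ | $\{f(Y_2)\}$ | $O(k)$ |
| 2 | $\{f(Y_1), f(Y_i)\},\ i \ne 1$ | $\{f(Y_2)\}$ | $O(k)$ |
| 3 | $\{f(Y_1), f(Y_i)\},\ i \ne 1$ | $\{f(Y_2), f(Y_j)\},\ \forall f(Y_j) \in Y_R,\ j \ne 2$ | $O(k \log k)$ |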


In the second and third situations, we only need to check those feasible definitions of $F$ with $V_X = \{f(X_i)\}$ for which $f(Y_i) \in B_Y$ and $m - f(X_i) + f(Y_i) > \alpha_1(U_X, V_X) = \alpha_1^* + m - f(X_2) - f(X_i)$. In the third situation, we can avoid checking all $O(k^2)$ combinations of $f(X_i) \in X_R$ with $i \ne 1$ and $f(Y_j) \in Y_R$ with $j \ne 2$ by using the same method developed for the sixth situation of subcase (b) in Case 1.

(b) If $D_X = \{f(X_1)\}$ and $D_Y = \emptyset$, then $U_X = \{f(X_1)\}$ and $V_X = \{f(X_i)\}$, $\forall f(X_i) \in X_R$. Consider the following feasible definitions of $F$.


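| # | $U_Y$ | $V_Y$ | Time |
|---|---|---|---|
| 1 | $\emptyset$ | $\emptyset$ | $O(k)$ |
| 2 | $\{f(Y_i)\}$ | $\emptyset$ | $O(k)$ |
| 3 | $\{f(Y_i)\}$ | $\{f(Y_j)\},\ \forall f(Y_j) \in Y_R$ | $O(k \log k)$ |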


In the second and third situations, we only need to check those feasible definitions of $F$ with $V_X = \{f(X_i)\}$ for which $f(Y_i) \in B_Y$ and $m - f(X_i) + f(Y_i) > \alpha_1(U_X, V_X) = \alpha_1^* + m - f(X_1) - f(X_i)$. In the third situation, we use a method similar to that of the sixth situation of subcase (b) in Case 1 to avoid checking all $O(k^2)$ combinations of $f(X_i) \in X_R$ and $f(Y_j) \in Y_R$. The only differences are that $\alpha_2(U_Y, V_Y) = \alpha_2^* + n - f(Y_i) - f(Y_j)$, and that if $f(Y_1) \in Y_R$, we use $m - f(X_1) + n - f(Y_1)$ instead of $f(X_1) + n - f(Y_1)$ for choice $f(Y_1)$ in the sorting step.
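The change to the sorting keys can be sketched as follows. The helper name `sorting_keys`, the dictionary representation of $f(X_j)$ and $f(Y_j)$, and the parameter `switched_x` are assumptions made for illustration: a job whose X-choice has already been switched (here $f(X_1)$, since $U_X = \{f(X_1)\}$) contributes $m - f(X_j)$ instead of $f(X_j)$ to its key.

```python
def sorting_keys(fX, fY, Y_R, n, m, switched_x=frozenset()):
    """Return (key, w, j) for every j in Y_R, with w = f(Y_j) and
    key = (length of job j's X-part) + n - f(Y_j).

    The X-part is f(X_j), except for jobs listed in switched_x (for example
    j = 1 when U_X = {f(X_1)}), whose X-part is m - f(X_j)."""
    keys = []
    for j in Y_R:
        x_part = m - fX[j] if j in switched_x else fX[j]
        keys.append((x_part + n - fY[j], fY[j], j))
    return keys


if __name__ == "__main__":
    fX = {1: 6, 2: 2, 3: 3}   # hypothetical f(X_j) values
    fY = {1: 4, 2: 1, 3: 5}   # hypothetical f(Y_j) values
    print(sorting_keys(fX, fY, [1, 2, 3], n=8, m=10, switched_x={1}))
    # [(8, 4, 1), (9, 1, 2), (6, 5, 3)]
```

Only the key of the switched job changes, so the rest of the sort, prune, and binary-search procedure from Case 1 applies unchanged.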

(c) If $D_X = \{f(X_1)\}$ and $D_Y = \{f(Y_h)\}$, where $h$ is a fixed index that may or may not equal 1, then $U_X = \{f(X_1)\}$ and $V_X = \{f(X_i)\}$, $\forall f(X_i) \in X_R$. Note that $f(Y_i) \in U_Y$ only when $f(Y_i) \in B_Y$ and $m - f(X_i) + f(Y_i) > \alpha_1(U_X, V_X) = \alpha_1^* + m - f(X_1) - f(X_i)$. Consider the following feasible definitions of $F$.


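| # | $U_Y$ | $V_Y$ | Time |
|---|---|---|---|
| 1 | $\emptyset$ | $\emptyset$ | $O(k)$ |
| 2 | $\{f(Y_i)\}$ | $\emptyset$ | $O(k)$ |
| 3 | $\{f(Y_i)\}$ | $\{f(Y_j)\},\ \forall f(Y_j) \in Y_R$ | $O(k \log k)$ |
| 4 | $\{f(Y_h)\}$ | $\emptyset$ | $O(k)$ |
| 5 | $\{f(Y_h)\}$ | $\{f(Y_j)\},\ \forall f(Y_j) \in Y_R,\ j \ne i$ | $O(k \log k)$ |
| 6 | $\{f(Y_h), f(Y_i)\},\ i \ne h$ | $\emptyset$ | $O(k)$ |
| 7 | $\{f(Y_h), f(Y_i)\},\ i \ne h$ | $\{f(Y_j)\},\ \forall f(Y_j) \in Y_R$ | $O(k \log k)$ |
| 8 | $\{f(Y_h), f(Y_i)\},\ i \ne h$ | $\{f(Y_j), f(Y_l)\},\ \forall f(Y_j), f(Y_l) \in Y_R$ | $O(k \log k)$ |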


The reason why $j \ne i$ in the fifth situation is that if $f(Y_i) \in Y_R$ and we let $V_Y = \{f(Y_i)\}$, then $m - f(X_i) + n - f(Y_i) > \max\{f(X_1) + f(Y_1),\ f(X_h) + f(Y_h)\}$, which indicates that the resulting assignment is even worse than the original assignment without any switches. The method used in the sixth situation of subcase (b) in Case 1 can be applied to the third, fifth, and seventh situations in this subcase to achieve the $O(k \log k)$ bound. In the eighth situation, if we check all combinations of $f(X_i) \in X_R$ and $f(Y_j), f(Y_l) \in Y_R$, there will be $O(k^3)$ possibilities. We will show that not all combinations need to be examined. Our goal is to choose $f(Y_j), f(Y_l) \in Y_R$ for each fixed $f(X_i) \in X_R$ so as to minimize $\max\{\alpha_2(U_Y, V_Y),\ f(X_j) + n - f(Y_j),\ f(X_l) + n - f(Y_l)\}$, where $\alpha_2(U_Y, V_Y) = \alpha_2^* + 2n - f(Y_h) - f(Y_i) - f(Y_j) - f(Y_l)$. Without loss of generality, assume $f(X_j) + n - f(Y_j) \ge f(X_l) + n - f(Y_l)$. First, sort all $f(Y_j) \in Y_R$ in $O(k \log k)$ time in nondecreasing order of $f(X_j) + n - f(Y_j)$. Second, in the sorted list, for each $f(Y_j)$ except the first one, let $f(Y_l)$ be the largest choice among those to the left of $f(Y_j)$; this is easily done in $O(k)$ time. We now have a list of $|Y_R| - 1$ choice pairs $f(Y_j), f(Y_l)$ ordered by nondecreasing $f(X_j) + n - f(Y_j)$. Third, in $O(k)$ time, discard those pairs whose sum $f(Y_j) + f(Y_l)$ is no greater than that of their left neighbors in the list. Finally, for each $f(X_i) \in X_R$, use binary search to find, among the remaining pairs in the list, the pair $f(Y_j), f(Y_l)$ with the minimum $\max\{\alpha_2^* + 2n - f(Y_h) - f(Y_i) - f(Y_j) - f(Y_l),\ f(X_j) + n - f(Y_j),\ f(X_l) + n - f(Y_l)\}$, which altogether takes $O(k \log k)$ time. In the above process, if $f(Y_1) \in Y_R$, use $m - f(X_1) + n - f(Y_1)$ instead of $f(X_1) + n - f(Y_1)$ for choice $f(Y_1)$ in the sorting step.
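A minimal Python sketch of the pair construction for the eighth situation, using a hypothetical `(key, w, j)` encoding of candidates with $\text{key} = f(X_j) + n - f(Y_j)$ and $w = f(Y_j)$. Because the pair's larger key belongs to the member that appears later in the sorted list, the objective for a pair reduces to $\max\{c - (w_j + w_l), \text{key}_j\}$, where `c` stands for $\alpha_2^* + 2n - f(Y_h) - f(Y_i)$ for the fixed $f(X_i)$. The function names are illustrative, and the index exclusions and the special key for $f(Y_1)$ are left out for brevity.

```python
def build_pair_frontier(candidates):
    """candidates: (key, w, j) tuples.  Returns (key, total, (j, l)) tuples.

    After sorting by key, each candidate j (except the first) is paired with
    the candidate l of largest w to its left; key_j >= key_l, so key_j is the
    pair's larger key.  Pairs whose total w_j + w_l is no greater than that of
    the last pair kept are discarded."""
    pairs = []
    best_left = None  # (w, index) with the largest w seen so far
    for key, w, j in sorted(candidates):
        if best_left is not None:
            pairs.append((key, w + best_left[0], (j, best_left[1])))
        if best_left is None or w > best_left[0]:
            best_left = (w, j)
    frontier = []
    for key, total, pair in pairs:
        if not frontier or total > frontier[-1][1]:
            frontier.append((key, total, pair))
    return frontier


def best_pair(frontier, c):
    """Return (value, (j, l)) minimizing max(c - total, key), or None.

    key is nondecreasing and c - total strictly decreasing along the
    frontier, so the optimum sits at the crossover found by binary search."""
    if not frontier:
        return None
    lo, hi = 0, len(frontier) - 1
    while lo < hi:  # first position with key >= c - total
        mid = (lo + hi) // 2
        key, total, _ = frontier[mid]
        if key >= c - total:
            hi = mid
        else:
            lo = mid + 1
    best = None
    for idx in (lo - 1, lo):
        if idx >= 0:
            key, total, pair = frontier[idx]
            value = max(c - total, key)
            if best is None or value < best[0]:
                best = (value, pair)
    return best


if __name__ == "__main__":
    cands = [(7, 3, 0), (5, 1, 1), (9, 6, 2), (8, 2, 3)]  # hypothetical
    front = build_pair_frontier(cands)
    print(front)                  # [(7, 4, (0, 1)), (8, 5, (3, 0)), (9, 9, (2, 0))]
    print(best_pair(front, 14))   # (9, (3, 0))
```

Pairing each candidate with the best partner to its left reduces the $O(k^2)$ pairs to at most $|Y_R| - 1$, which is what makes the per-$f(X_i)$ binary search possible.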

(d) If $D_X = \{f(X_1)\}$ and $D_Y = \{f(Y_1), f(Y_2)\}$, then by Lemma 6, $f(X_1) + f(X_2) > \alpha^*$, and $f(X_1)$ and $f(X_2)$ are in different columns of $f_X$. Since $f(X_1)$ is the bad choice, $f(X_2) \in X_R$. By assumption, $U_X = \{f(X_1)\}$ and $V_X = \{f(X_i)\}$, $\forall f(X_i) \in X_R$. We note the following properties of the feasible definitions of $F$.

First, for the situation in which $V_X = \{f(X_2)\}$, the feasible definitions we need to check can be handled in $O(k \log k)$ time. In the following discussion, we assume $V_X = \{f(X_i)\}$, where $i \ne 2$.

Second, we do not need to consider those situations where $V_X = \{f(X_i)\}$ for which $f(Y_i) \in B_Y$. Assume $V_X = \{f(X_i)\}$, where $i \ne 2$ and $f(Y_i) \in B_Y$. We have $f(X_2) + f(Y_2) > \alpha^* \ge \alpha_2^* \ge f(Y_1) + f(Y_2) + f(Y_i)$; therefore $f(X_2) > f(Y_1) + f(Y_i)$. We also have $f(X_2) + f(Y_2) > \alpha^* \ge \alpha_1^* \ge f(X_2) + f(X_i)$; therefore $f(Y_2) > f(X_i)$. We can show that $m - f(X_i) + n - f(Y_i) > f(X_1) + f(Y_1)$, because

$$f(X_1) < m - f(X_2) < m - f(Y_1) - f(Y_i) < m - f(Y_1) - f(Y_i) + n - f(X_i).$$

We can then show that $m - f(X_i) + n - f(Y_i) > f(X_2) + f(Y_2)$, because

$$f(X_2) + f(X_i) \le \alpha^* < f(X_1) + f(Y_1) < f(X_1) + f(X_2) - f(Y_i) < m - f(Y_i) \le m - \min\{f(Y_2), f(Y_i)\} \le m - \min\{f(Y_2), f(Y_i)\} + n - \max\{f(Y_2), f(Y_i)\} = m + n - f(Y_2) - f(Y_i).$$

This means that $m - f(X_i) + f(Y_i) > m - f(X_i) + n - f(Y_i) > \max\{f(X_1) + f(Y_1),\ f(X_2) + f(Y_2)\}$, which indicates that whether we switch $f(Y_i)$ or not, the resulting assignment is always worse than the original assignment without any switches.

Taking the above facts into account, we only need to consider the following feasible definitions of $F$. Without loss of generality, let $h \in \{1, 2\}$ be such that $f(X_h) + f(Y_h) = \max\{f(X_1) + f(Y_1),\ f(X_2) + f(Y_2)\}$. This means that if $|U_Y| = 1$, then $U_Y = \{f(Y_h)\}$.


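| # | $U_Y$ | $V_Y$ | Time |
|---|---|---|---|
| 1 | $\emptyset$ | $\emptyset$ | $O(k)$ |
| 2 | $\{f(Y_h)\}$ | $\emptyset$ | $O(k)$ |
| 3 | $\{f(Y_h)\}$ | $\{f(Y_j)\},\ \forall f(Y_j) \in Y_R,\ j \ne i$ | $O(k \log k)$ |
| 4 | $\{f(Y_1), f(Y_2)\}$ | $\emptyset$ | $O(k)$ |
| 5 | $\{f(Y_1), f(Y_2)\}$ | $\{f(Y_j)\},\ \forall f(Y_j) \in Y_R,\ j \ne i$ | $O(k \log k)$ |
| 6 | $\{f(Y_1), f(Y_2)\}$ | $\{f(Y_j), f(Y_l)\},\ \forall f(Y_j), f(Y_l) \in Y_R,\ j, l \ne i$ | $O(k \log k)$ |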


As in the previous subcases, the work needed to check all situations in this subcase is also bounded by $O(k \log k)$.

Case 3: $|B_X| = 2$.

By Lemma 7, $|D_X| \le 1$ and $|D_Y| \le 1$, and if there is a disastrous choice in $f_X$, there is also a disastrous choice in $f_Y$. We consider two subcases: (a) $|D_X| = 0$ and $|D_Y| = 1$; and (b) $|D_X| = 1$ and $|D_Y| = 1$.

(a) If $D_X = \emptyset$ and $D_Y = \{f(Y_1)\}$, then by Lemma 7, $f(X_1) \in X_R$, and $V_Y$, if nonempty, only contains choices $f(Y_i)$ with $f(X_i) \notin B_X$. Otherwise, $f(X_i) + n - f(Y_i) > f(X_1) + f(Y_1)$, which implies that the resulting $\alpha$ is even larger than that for $f$. So there are no potential switches in $f_X$. Consider the following feasible definitions of $F$.


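| # | $U_X$ | $V_X$ | $U_Y$ | $V_Y$ | Time |
|---|---|---|---|---|---|
| 1 | $\emptyset$ | $\emptyset$ | $\emptyset$ | $\emptyset$ | $O(1)$ |
| 2 | $\emptyset$ | $\emptyset$ | $\{f(Y_1)\}$ | $\emptyset$ | $O(1)$ |
| 3 | $\emptyset$ | $\emptyset$ | $\{f(Y_1)\}$ | $\{f(Y_i)\},\ \forall f(Y_i) \in Y_R$ with $f(X_i) \notin B_X$ | $O(k)$ |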


(b) If $D_X = \{f(X_1)\}$ and $D_Y = \{f(Y_1)\}$, then we note the following properties of the feasible definitions of $F$.

First, we do not have to consider the situation in which both $f(X_1)$ and $f(Y_1)$ are switched. Assuming $f(X_2)$ is the other bad choice in $f_X$, we have $m - f(X_1) + n - f(Y_1) < f(X_1) + n - f(Y_1) < f(X_1) + f(X_2) \le \alpha^*$, which shows that switching just $f(Y_1)$ is already good enough; there is no need to switch both $f(X_1)$ and $f(Y_1)$.

Second, $|U_X| \le 1$. Assume $U_X = \{f(X_1), f(X_2)\}$. This case happens only when $f(Y_2) \in V_Y$, $f(X_i) \in V_X$ for some $f(X_i) \in X_R$, and $f(X_2) + n - f(Y_2) > \alpha_2(U_Y, V_Y) = \alpha_2^* + n - f(Y_i) - f(Y_2)$. Then $f(Y_1) + f(Y_i) \le \alpha_2^* < f(X_2) + f(Y_i)$, so $f(Y_1) < f(X_2)$. On the other hand, $f(X_1) + f(Y_1) > \alpha^* \ge f(X_1) + f(X_2)$, so $f(Y_1) > f(X_2)$. A contradiction!

Taking the above facts into account, we only need to consider the following feasible 
definitions of F. 


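| # | $U_X$ | $V_X$ | $U_Y$ | $V_Y$ | Time |
|---|---|---|---|---|---|
| 1 | $\emptyset$ | $\emptyset$ | $\emptyset$ | $\emptyset$ | $O(1)$ |
| 2 | $\emptyset$ | $\emptyset$ | $\{f(Y_1)\}$ | $\emptyset$ | $O(1)$ |
| 3 | $\emptyset$ | $\emptyset$ | $\{f(Y_1)\}$ | $\{f(Y_i)\},\ \forall f(Y_i) \in Y_R$ | $O(k)$ |
| 4 | $\{f(X_2)\}$ | $\emptyset$ | $\{f(Y_1)\}$ | $\{f(Y_2)\}$ | $O(1)$ |
| 5 | $\{f(X_2)\}$ | $\{f(X_i)\},\ \forall f(X_i) \in X_R$ | $\{f(Y_1)\}$ | $\{f(Y_2)\}$ | $O(k)$ |
| 6 | $\{f(X_1)\}$ | $\emptyset$ | $\emptyset$ | $\emptyset$ | $O(1)$ |
| 7 | $\{f(X_1)\}$ | $\{f(X_i)\},\ \forall f(X_i) \in X_R$ | $\emptyset$ | $\emptyset$ | $O(k)$ |
| 8 | $\{f(X_1)\}$ | $\{f(X_i)\},\ \forall f(X_i) \in X_R$ | $\{f(Y_i)\}$ | $\emptyset$ | $O(k)$ |
| 9 | $\{f(X_1)\}$ | $\{f(X_i)\},\ \forall f(X_i) \in X_R$ | $\{f(Y_i)\}$ | $\{f(Y_j)\},\ \forall f(Y_j) \in Y_R$ | $O(k)$ |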


We check the fourth and fifth situations only when $f(Y_2) \in Y_R$ and $f(X_2) + n - f(Y_2) > \alpha_2(U_Y, V_Y) = \alpha_2^* + n - f(Y_1) - f(Y_2)$. In the eighth and ninth situations, we only need to check those combinations with $V_X = \{f(X_i)\}$ for which $f(Y_i) \in B_Y$ and $m - f(X_i) + f(Y_i) > \alpha_1(U_X, V_X) = \alpha_1^* + m - f(X_1) - f(X_i)$. We shall prove that there is at most one such $f(X_i)$. Assume there are two, say, $f(X_i)$ and $f(X_j)$. Then $m - f(X_i) + f(Y_i) > \alpha_1^* + m - f(X_1) - f(X_i)$, so $f(X_1) + f(X_2) \le \alpha_1^* < f(X_1) + f(Y_i)$. Therefore $f(X_2) < f(Y_i)$. Similarly, we have $f(X_2) < f(Y_j)$. However, $f(X_1) + f(Y_1) > \alpha^* \ge \alpha_2^* \ge f(Y_1) + f(Y_i) + f(Y_j)$, so $m \ge f(X_1) > f(Y_i) + f(Y_j) > 2f(X_2)$. Hence $f(X_2) < m/2$, which contradicts the fact that $f(X_2)$ is a bad choice. In the eighth situation, we spend $O(k)$ time to find the $f(X_i) \in X_R$, if it exists, and $O(1)$ time to check the corresponding situation. In the ninth situation, we spend $O(k)$ time to find the $f(X_i) \in X_R$, if it exists, and $O(k)$ time to check the feasible definitions with $V_Y = \{f(Y_j)\}$, $\forall f(Y_j) \in Y_R$.
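The $O(k)$ scan for the qualifying $f(X_i)$ in the eighth and ninth situations can be sketched as below. The function name and the dictionary encoding are illustrative assumptions. Cancelling $m - f(X_i)$ on both sides, the test $m - f(X_i) + f(Y_i) > \alpha_1^* + m - f(X_1) - f(X_i)$ reduces to $f(Y_i) > \alpha_1^* - f(X_1)$, which is what the code checks.

```python
def qualifying_partners(fX1, fY, X_R, bad_Y, alpha1_star):
    """Return the indices i in X_R with f(Y_i) in B_Y and
        m - f(X_i) + f(Y_i) > alpha_1^* + m - f(X_1) - f(X_i).

    Cancelling m - f(X_i) leaves f(Y_i) > alpha_1^* - f(X_1), so neither m
    nor f(X_i) is needed.  Under the hypotheses of Case 3(b) the argument
    above shows the result has at most one element; the list form simply
    lets a caller verify that."""
    threshold = alpha1_star - fX1
    return [i for i in X_R if i in bad_Y and fY[i] > threshold]


if __name__ == "__main__":
    # Hypothetical data, only to exercise the scan.
    print(qualifying_partners(fX1=7, fY={3: 4, 4: 2}, X_R=[3, 4],
                              bad_Y={3}, alpha1_star=10))   # [3]
```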


5 Conclusion 

This paper studies a problem of routing messages on an SIMD parallel architecture whose processing elements (PEs) are connected as a toroidal mesh. In our problem the sets of messages that processors send are isomorphic, meaning that if some processor has a message to send that must traverse $x_i$ PEs in the East-West dimension and $y_i$ PEs in the North-South dimension, then all PEs have a message to send with identical routing offsets. We examine variants of the problem with differing assumptions concerning the simultaneous use of communication channels and the ability to buffer a message temporarily en route. Our approach is to view the problem as a scheduling problem, related to a previously studied open shop scheduling problem. We obtain new results not only on the motivating routing problem, but also on a new class of scheduling problems.

A spectrum of complexities is obtained, from linear in the number of messages ($k$) per processor to NP-complete. The variation where all ports may be used simultaneously and messages may be buffered en route is of particular interest; we first show quickly why the problem has a polynomial solution, and then carry out an extensive case analysis to show that the complexity is $O(k \log k)$. The case analysis lacks elegance; our hope is that future work may provide a more direct solution to the problem.


Appendix 

Lemma 1 If $|B_X| \ge 3$, then $F = f$.

Proof. We shall prove that $D_X = \emptyset$ and $D_Y = \emptyset$. Suppose that $f(X_1), f(X_2), f(X_3), \ldots$ are the bad choices in $f_X$, and that $f(X_1)$ is the largest among them. Assume that there is at least one disastrous choice in $f_X$ (alt., $f_Y$), say, $f(X_i)$ (alt., $f(Y_i)$). Then $f(X_i) + f(Y_i) > \alpha^* \ge \alpha_1^* \ge f(X_1) + f(X_2) + f(X_3)$. So

$$f(Y_i) > (f(X_1) - f(X_i)) + f(X_2) + f(X_3) \ge f(X_2) + f(X_3) > 2 \times \frac{m}{2} = m,$$

which is impossible. ■

Lemma 2 $|D_Y| \le 2$.
Proof. Assume $D_Y = \{f(Y_1), f(Y_2), f(Y_3), \ldots\}$. Then at least two of $f(X_1), f(X_2), f(X_3)$ are in the same column of the assignment diagram for $f_X$, say, $f(X_1)$ and $f(X_2)$. Since both $f(Y_1)$ and $f(Y_2)$ are disastrous choices, we have $f(X_1) + f(Y_1) > \alpha^*$ and $f(X_2) + f(Y_2) > \alpha^*$; therefore,

$$f(X_1) + f(Y_1) + f(X_2) + f(Y_2) > 2\alpha^*.$$

However, we know $f(X_1) + f(X_2) \le \alpha_1^* \le \alpha^*$ and $f(Y_1) + f(Y_2) \le \alpha_2^* \le \alpha^*$; therefore,

$$f(X_1) + f(X_2) + f(Y_1) + f(Y_2) \le 2\alpha^*.$$

This is a contradiction. ■

Lemma 3 $|U_X| \ge |V_X|$ and $|U_Y| \ge |V_Y|$.

Proof. We only prove $|U_X| \ge |V_X|$, since the proof of $|U_Y| \ge |V_Y|$ is symmetric and hence omitted. For notational simplicity, we drop the subscript $X$ in the discussion below.

Assume $|U| < |V|$. Let $V' \subset V$ be any subset with $|V'| = |U|$. Let $\alpha_1(U, V)$ be the $\alpha_1$ resulting from the switches in $U$ and $V$, and $\alpha_1(U, V')$ the $\alpha_1$ resulting from the switches in $U$ and $V'$. Let $\alpha_1^* = \max\{\alpha_1^+, \alpha_1^-\}$, where $\alpha_1^+ = \sum_{f(X_i) = x_i} f(X_i)$ and $\alpha_1^- = \sum_{f(X_i) = m - x_i} f(X_i)$. Then

$$\alpha_1(U, V) = \max\Big\{\alpha_1^+ - \sum_U f(X_i) + \sum_V (m - f(X_i)),\ \alpha_1^- - \sum_V f(X_i) + \sum_U (m - f(X_i))\Big\} = \alpha_1^+ - \sum_U f(X_i) + \sum_V (m - f(X_i))$$

(since $\alpha_1^+ - \alpha_1^- \ge m(|U| - |V|)$), and

$$\alpha_1(U, V') = \max\Big\{\alpha_1^+ - \sum_U f(X_i) + \sum_{V'} (m - f(X_i)),\ \alpha_1^- - \sum_{V'} f(X_i) + \sum_U (m - f(X_i))\Big\} \le \alpha_1^+ - \sum_U f(X_i) + \sum_V (m - f(X_i)) = \alpha_1(U, V)$$

(since $V' \subset V$, and $\alpha_1^- - \alpha_1^+ \le \sum_{V - V'} (m - f(X_i))$).

Therefore, $\alpha_1(U, V') \le \alpha_1(U, V)$, and the resulting assignment has fewer bad choices. Why not use $V'$ instead? In other words, the choices in $V - V'$ do not have to be switched to the opposite column, since this does not lower $\alpha_1$ and instead creates some new bad choices. ■
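The column-load formula used in the proofs of Lemmas 3 and 4 can be evaluated directly. In this sketch the names and encoding are assumptions: `f[i]` is $f(X_i)$, `plus[i]` is True when $f(X_i) = x_i$ (the choice is counted in $\alpha_1^+$), `U` lists choices moved out of the $+$ column and `V` choices moved out of the $-$ column. The small example mirrors the exchange argument of Lemma 3: with $|U| = 0 < |V| = 2$, dropping the surplus switches in $V$ lowers $\alpha_1$.

```python
def alpha1(f, plus, m, U=(), V=()):
    """alpha_1(U, V) = max{alpha_1^+ - sum_U f + sum_V (m - f),
                           alpha_1^- - sum_V f + sum_U (m - f)}."""
    assert all(plus[i] for i in U), "U must come from the + column"
    assert all(not plus[i] for i in V), "V must come from the - column"
    a_plus = sum(fi for fi, p in zip(f, plus) if p)
    a_minus = sum(fi for fi, p in zip(f, plus) if not p)
    plus_load = a_plus - sum(f[i] for i in U) + sum(m - f[i] for i in V)
    minus_load = a_minus - sum(f[i] for i in V) + sum(m - f[i] for i in U)
    return max(plus_load, minus_load)


if __name__ == "__main__":
    m = 8
    f = [3, 4, 5, 2]                   # hypothetical f(X_1), ..., f(X_4)
    plus = [True, True, False, False]  # first two satisfy f(X_i) = x_i
    print(alpha1(f, plus, m))                   # 7: both columns carry 7
    print(alpha1(f, plus, m, U=(), V=(2, 3)))   # 16
    print(alpha1(f, plus, m, U=(1,), V=(2,)))   # 6
```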


Lemma 4 All members of $U_X$ and $U_Y$ are potential switches.

Proof. As before, we only prove the lemma for $U_X$ and omit the subscript $X$. If $|U| = |V|$ and $U$ contains some choices that are not potential switches, let $U'$ be the set of potential switches in $U$, and $V'$ be any subset of $V$ with $|V'| = |U'|$. Then

$$\alpha_1(U, V) = \max\Big\{\alpha_1^+ - \sum_U f(X_i) + \sum_V (m - f(X_i)),\ \alpha_1^- - \sum_V f(X_i) + \sum_U (m - f(X_i))\Big\} = \alpha_1^+ - \sum_U f(X_i) + \sum_V (m - f(X_i)),$$

$$\alpha_1(U', V') = \max\Big\{\alpha_1^+ - \sum_{U'} f(X_i) + \sum_{V'} (m - f(X_i)),\ \alpha_1^- - \sum_{V'} f(X_i) + \sum_{U'} (m - f(X_i))\Big\} = \alpha_1^+ - \sum_{U'} f(X_i) + \sum_{V'} (m - f(X_i))$$

$$\le \alpha_1^+ - \sum_U f(X_i) + \sum_V (m - f(X_i)) = \alpha_1(U, V)$$

(since $\sum_{U - U'} f(X_i) \le \sum_{V - V'} (m - f(X_i))$).

Therefore, $\alpha_1(U', V') \le \alpha_1(U, V)$, and $U'$ does not contain any unnecessary switches.

If $|U| > |V|$ and $U$ contains some choices that are not potential switches, let $U'$ be the set of potential switches in $U$, and $V'$ be any subset of $V$ with $|V'| = \max\{0, |V| - |U - U'|\}$. Then

$$\alpha_1(U, V) = \max\Big\{\alpha_1^+ - \sum_U f(X_i) + \sum_V (m - f(X_i)),\ \alpha_1^- - \sum_V f(X_i) + \sum_U (m - f(X_i))\Big\} = \alpha_1^- - \sum_V f(X_i) + \sum_U (m - f(X_i))$$

(since $\alpha_1^+ - \alpha_1^- \le m(|U| - |V|)$), and

$$\alpha_1(U', V') = \max\Big\{\alpha_1^+ - \sum_{U'} f(X_i) + \sum_{V'} (m - f(X_i)),\ \alpha_1^- - \sum_{V'} f(X_i) + \sum_{U'} (m - f(X_i))\Big\} = \alpha_1^- - \sum_{V'} f(X_i) + \sum_{U'} (m - f(X_i))$$

(since $\alpha_1^+ - \alpha_1^- \le m(|U'| - |V'|)$)

$$\le \alpha_1^- - \sum_V f(X_i) + \sum_U (m - f(X_i)) = \alpha_1(U, V)$$

(since $\sum_{V - V'} f(X_i) \le \sum_{U - U'} (m - f(X_i))$).

Therefore, $\alpha_1(U', V') \le \alpha_1(U, V)$, and $U'$ does not contain any unnecessary switches. ■


Lemma 5 If $|B_X| = 0$, then $|U_X| = |V_X| = 0$ and $|D_Y| \le 2$.

Proof. By Lemma 3 and Lemma 4, $|V_X| \le |U_X| = 0$. By Lemma 2, $|D_Y| \le 2$. ■

Lemma 6 If $|B_X| = 1$, then $|V_X| \le |U_X| \le 1$ and $|D_Y| \le 2$. Furthermore, if $D_Y = \{f(Y_1), f(Y_2)\}$, then $\alpha^* < f(X_1) + f(X_2)$, and one of $f(X_1)$ and $f(X_2)$ is the largest bad choice in $f_X$.

Proof. Assume $\alpha^* \ge f(X_1) + f(X_2)$. Then $f(X_1) + f(Y_1) > \alpha^* \ge f(X_1) + f(X_2)$, so $f(Y_1) > f(X_2)$. On the other hand, $f(X_2) + f(Y_2) > \alpha^* \ge \alpha_2^* \ge f(Y_1) + f(Y_2)$, so $f(X_2) > f(Y_1)$. A contradiction!

We notice that when $\alpha^* < f(X_1) + f(X_2)$, $f(X_1)$ and $f(X_2)$ are in different columns, and one of them, say, $f(X_1)$, has to be the largest bad choice in $f_X$. ■


Lemma 7 If $|B_X| = 2$, then $|D_X| \le 1$ and $|D_Y| \le 1$. Furthermore, if $D_X = \{f(X_1)\}$, then $D_Y = \{f(Y_1)\}$; if $D_Y = \{f(Y_1)\}$ and $f(X_1) \notin B_X$, then $f(X_1)$ is in the right column of the assignment diagram of $f_X$.

Proof. Suppose that $f(X_i)$ and $f(X_j)$ are the two bad choices in $f_X$. First, we notice that if $f(X_l)$ ($l = i$ or $j$) is a disastrous choice, then $f(Y_l)$ is also a disastrous choice, because $f(X_l) + f(Y_l) > \alpha^* \ge \alpha_1^* \ge f(X_i) + f(X_j)$, and therefore $f(Y_l) > f(X_{l'}) > m/2$, where $l'$ is the other index in $\{i, j\}$.

Assume that there are at least two disastrous choices in $f_X$. They must be $f(X_i)$ and $f(X_j)$. Since both $f(Y_i)$ and $f(Y_j)$ are then disastrous choices, they are in the same column of the assignment diagram of $f_Y$. Then $f(X_i) + f(Y_i) > \alpha^* \ge \alpha_2^* \ge f(Y_i) + f(Y_j)$, so $f(X_i) > f(Y_j)$. On the other hand, $f(X_j) + f(Y_j) > \alpha^* \ge \alpha_1^* \ge f(X_i) + f(X_j)$, so $f(Y_j) > f(X_i)$. A contradiction!

Assume that there are at least two disastrous choices in $f_Y$, say, $f(Y_1)$ and $f(Y_2)$. Then $f(X_1) + f(Y_1) > \alpha^* \ge \alpha_2^* \ge f(Y_1) + f(Y_2)$, so $f(X_1) > f(Y_2)$. On the other hand, $f(X_2) + f(Y_2) > \alpha^* \ge \alpha_1^* \ge f(X_i) + f(X_j) \ge f(X_1) + f(X_2)$, so $f(Y_2) > f(X_1)$. A contradiction!

If $D_X = \{f(X_1)\}$, then $D_Y = \{f(Y_1)\}$. If $D_Y = \{f(Y_1)\}$ and $f(X_1) \notin B_X$, then $f(X_1)$ must be in the right column. Otherwise, $f(X_1) + f(Y_1) > \alpha^* \ge \alpha_1^* \ge f(X_i) + f(X_j) + f(X_1)$, so $f(Y_1) > f(X_i) + f(X_j) > m$, which is impossible. ■